# Zero-shot Video Retrieval
LLaVE-7B
Apache-2.0
LLaVE-7B is a 7-billion-parameter multimodal embedding model based on LLaVA-OneVision-7B. It produces embeddings for text, images, multiple images, and videos.
Multimodal Fusion
Transformers English

zhibinlan
1,389
5
LLaVE-2B
Apache-2.0
LLaVE-2B is a 2-billion-parameter multimodal embedding model based on Aquila-VL-2B. It has a 4K-token context window and produces embeddings for text, images, multiple images, and videos.
Text-to-Image
Transformers English

zhibinlan
20.05k
45
LLaVE-0.5B
Apache-2.0
LLaVE-0.5B is a 0.5-billion-parameter multimodal embedding model based on LLaVA-OneVision-0.5B. It produces embeddings for text, images, multiple images, and videos.
Multimodal Fusion
Transformers English

zhibinlan
2,897
7
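
The models listed above all map text and video into a shared embedding space, so zero-shot video retrieval usually comes down to ranking candidate videos by their similarity to a text query. The following is a minimal sketch of that ranking step only, assuming cosine similarity as the scoring function. The function names `cosine_similarity` and `rank_videos`, the clip identifiers, and the three-dimensional vectors are all hypothetical placeholders; they are not LLaVE outputs, and the sketch does not show how to load or call any of the models.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_videos(query_embedding, video_embeddings):
    """Return (video_id, score) pairs sorted from most to least similar to the query."""
    scored = [
        (video_id, cosine_similarity(query_embedding, embedding))
        for video_id, embedding in video_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder vectors standing in for a text-query embedding and video embeddings.
# In practice these would come from an embedding model and be much higher-dimensional.
query = [1.0, 0.0, 0.0]
videos = {
    "clip_a": [0.9, 0.1, 0.0],
    "clip_b": [0.0, 1.0, 0.0],
    "clip_c": [0.5, 0.5, 0.0],
}

ranking = rank_videos(query, videos)
for video_id, score in ranking:
    print(f"{video_id}: {score:.3f}")
```

Because no retrieval-specific training is involved in this step, the same ranking code would apply unchanged to any model that produces comparable text and video embeddings.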